Method and system for suggesting revisions to an electronic document

ABSTRACT

A method for suggesting revisions to a document-under-analysis from a seed database, the seed database including a plurality of original texts each respectively associated with one of a plurality of final texts, the method for suggesting revisions including selecting a statement-under-analysis (“SUA”), selecting a first original text of the plurality of original texts, determining a first edit-type classification of the first original text with respect to its associated final text, generating a first similarity score for the first original text based on the first edit-type classification, the first similarity score representing a degree of similarity between the SUA and the first original text, selecting a second original text of the plurality of original texts, determining a second edit-type classification of the second original text with respect to its associated final text, generating a second similarity score for the second original text based on the second edit-type classification, the second similarity score representing a degree of similarity between the SUA and the second original text, selecting a candidate original text from one of the first original text and the second original text, and creating an edited SUA (“ESUA”) by modifying a copy of the first SUA consistent with a first candidate final text associated with the first candidate original text.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of U.S. application Ser. No. 16/689,469, filed on Nov. 20, 2019, which issues as U.S. Pat. No. 10,713,436 on Jul. 14, 2020 and is a continuation of U.S. application Ser. No. 16/361,781, filed on Mar. 22, 2019, which issued as U.S. Pat. No. 10,515,149 on Dec. 24, 2019 and is a non-provisional of, and claims the priority benefit of, U.S. Provisional Application No. 62/650,607, filed on Mar. 30, 2018. Reference is made to U.S. application Ser. No. 15/227,093 filed Aug. 3, 2016, which issued as U.S. Pat. No. 10,216,715 and is a non-provisional of, and claims the priority benefit of, U.S. Prov. Pat. App. No. 62/200,261 filed Aug. 3, 2015; and U.S. application Ser. No. 16/197,769, filed on Nov. 21, 2018, which issued as U.S. Pat. No. 10,311,140, which is a continuation of U.S. application Ser. No. 16/170,628, filed on Oct. 25, 2018. The aforementioned applications are hereby incorporated by reference in their entirety.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under NSF 16-599, Award No. 1721878 awarded by the National Science Foundation. The government has certain rights in the invention.

TECHNICAL FIELD

The embodiments of the invention relate to a method and system for revising electronic documents, and more particularly, to a method and system for suggesting edits to an electronic document. Although embodiments of the invention are suitable for a wide scope of applications, it is particularly suitable for suggesting revisions to electronic documents where the suggested revisions are similar to past revisions of similar documents.

BACKGROUND

U.S. Pat. No. 10,216,715 contemplates a method and system for suggesting edits to a document by, generally, breaking a document-under-analysis (“DUA”) into many statements-under-analysis (“SUA”) and then comparing the SUAs against a “seed database” of past edits to determine if the SUA can be edited in the same way. The seed database of past edits includes “original text” and “final text” representing, respectively, an unedited text and the corresponding edit thereto. The method and system includes, generally, calculating a similarity score between the SUA and each of the “original texts” from the database. For original texts that have a similarity score that exceeds a threshold, the SUA and the original text are “aligned” and the edit from the corresponding “final text” is applied to the SUA to produce an edited SUA (“ESUA”). The ESUA can then be inserted into the DUA in place of the SUA. The SUA and corresponding ESUA can then be added to the seed database.

SUMMARY OF THE INVENTION

Some techniques contemplate calculating a similarity score in the same way for each of the original texts and aligning all SUAs and original/final texts in the same way. But a one-size-fits-all approach may not be optimal.

For example, by calculating a similarity score for all original/final texts in the same way, some similarity scores are calculated to be low even though an objective observer would indicate a high degree of similarity. This can happen, for example, when many words have been deleted.

Similarly, the effectiveness of applying edits to the SUA is determined in large part by the alignment of the SUA and the original/final texts. There are many ways to “align” sentences, and some alignments may yield better results for applying edits.

Thus, there is a need to provide a method and system with improved calculation of similarity scores and improved alignment of SUAs and the original/final texts. Accordingly, embodiments of the invention are directed to a method and system for suggesting revisions to an electronic document that substantially obviates one or more of the problems due to limitations and disadvantages of the related art.

An object of embodiments of the invention is to provide an improved similarity score for selecting original texts.

Another object of embodiments of the invention is to provide improved alignment of SUAs and the original/final texts.

Additional features and advantages of embodiments of the invention will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of embodiments of the invention. The objectives and other advantages of the embodiments of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

To achieve these and other advantages and in accordance with the purpose of embodiments of the invention, as embodied and broadly described, a method and system for suggesting revisions to an electronic document includes selecting a statement-under-analysis (“SUA”), selecting a first original text of the plurality of original texts, determining a first edit-type classification of the first original text with respect to its associated final text, generating a first similarity score for the first original text based on the first edit-type classification, the first similarity score representing a degree of similarity between the SUA and the first original text, selecting a second original text of the plurality of original texts, determining a second edit-type classification of the second original text with respect to its associated final text, generating a second similarity score for the second original text based on the second edit-type classification, the second similarity score representing a degree of similarity between the SUA and the second original text, selecting a candidate original text from one of the first original text and the second original text, and creating an edited SUA (“ESUA”) by modifying a copy of the first SUA consistent with a first candidate final text associated with the first candidate original text.

According to some embodiments, a method for suggesting revisions to text data is provided. The method includes the step of obtaining a text-under-analysis (“TUA”). The method includes the step of obtaining a candidate original text from a plurality of original texts. The method includes the step of identifying a first edit operation of the candidate original text with respect to a candidate final text associated with the candidate original text, the first edit operation having an edit-type classification. The method includes the step of selecting an alignment method from a plurality of alignment methods based on the edit-type classification of the first edit operation. The method includes the step of identifying a second edit operation based on the selected alignment method. The method includes the step of creating an edited TUA (“ETUA”) by applying to the TUA the second edit operation.

According to some embodiments, a non-transitory computer readable medium is provided, the non-transitory computer readable medium storing instructions configured to cause a computer to perform the method for suggesting revisions to text data.

According to some embodiments, a system for suggesting revisions to text data is provided. The system includes a processor and a non-transitory computer readable memory coupled to the processor. The processor is configured to obtain a text-under-analysis (“TUA”). The processor is configured to obtain a candidate original text from a plurality of original texts. The processor is configured to identify a first edit operation of the candidate original text with respect to a candidate final text associated with the candidate original text, the first edit operation having an edit-type classification. The processor is configured to select an alignment method from a plurality of alignment methods based on the edit-type classification of the first edit operation. The processor is configured to identify a second edit operation based on the selected alignment method. The processor is configured to create an edited TUA (“ETUA”) by applying to the TUA the second edit operation.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory and are intended to provide further explanation of embodiments of the invention as claimed.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of embodiments of the invention.

FIG. 1 is a block diagram illustrating a system for suggesting revisions to an electronic document, according to some embodiments.

FIG. 2 is a data flow diagram of a document upload process with edit suggestion, according to some embodiments.

FIG. 3 is a process flow chart for editing a SUA and updating a seed database according to some embodiments.

FIG. 4 illustrates an edited document, according to some embodiments.

FIG. 5 is an illustration of a point edit-type alignment according to some embodiments.

FIG. 6 is an illustration of a point edit-type alignment according to some embodiments.

FIG. 7 is an illustration of a span edit-type alignment according to some embodiments.

FIG. 8 is a block diagram illustrating an edit suggestion device, according to some embodiments.

FIG. 9 is a method for suggesting revisions to text data, according to some embodiments.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Reference will now be made in detail to embodiments of the invention, examples of which are illustrated in the accompanying drawings. The invention may, however, be embodied in many different forms and should not be construed as being limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the invention to those skilled in the art. In the drawings, the thicknesses of layers and regions are exaggerated for clarity. Like reference numerals in the drawings denote like elements.

U.S. Pat. No. 10,216,715 contemplates calculating similarity scores between SUAs and original texts of a seed database according to a pre-selected similarity metric. Significant research was invested in determining a single “best” metric for determining whether an original text in the seed database was sufficiently similar to the SUA such that the original text's corresponding final text could be coherently applied to the SUA.

In some embodiments, however, there may be no single “best” similarity metric and instead, the optimal metric may vary depending on, among other things, the type of edit that was applied to the original text in the seed database. Thus, according to some embodiments, the “best” similarity metric may be selected in view of the type of edit applied to the original text in the seed database. Moreover, according to some embodiments, the alignment method used between the SUA, original text, and final text may be optimally selected based on the type of edit.

Generally speaking, an “edit operation” means that between the original text and the final text, some text was deleted, replaced, or inserted. The concept of “type of edit” refers to the type of edit operation that was performed on the original text in the seed database to get to the final text in the seed database. Non-limiting examples of the “type of edit” can include, for example, a full sentence edit, a parenthetical edit, a single word edit, a structured list edit, an unstructured list edit, or a fronted constituent edit.

A type of edit can be a “full sentence delete” such as deleting the sentence: “In the event disclosing party brings suit to enforce the terms of this Agreement, the prevailing party is entitled to an award of its attorneys' fees and costs.”

A type of edit can be a “full sentence replace” such as replacing the sentence “Receipt of payment by the Contractor from the Owner for the Subcontract Work is a condition precedent to payment by the Contractor to the Subcontractor,” with “In no event and regardless of any paid-if-paid or pay-when-paid contained herein, will Contractor pay the Subcontractor more than 60 days after the Subcontractor completes the work and submits an acceptable payment application.”

A type of edit can be a “full sentence insert,” which can be performed after a particular sentence, or a sentence having a particular meaning, for example, taking an original sentence “In the event of Recipient's breach or threatened breach of this Agreement, Disclosing Party is entitled, in addition to all other remedies available under the law, to seek injunctive relief,” and inserting after the sentence: “In no event; however, will either Party have any liability for special or consequential damages.”

A type of edit can be a “full sentence insert,” which can be performed where an agreement is lacking required specificity, for example by adding “The Contractor shall provide the Subcontractor with the same monthly updates to the Progress Schedule that the Contractor provides to the Owner, including all electronic files used to produce the updates to the Progress Schedule.”

A type of edit can be a “structured list delete”, for example, deleting “(b) Contractor's failure to properly design the Project” from the following structured list: “Subcontractor shall indemnify Contractor against all damages caused by the following: (a) Subcontractor's breach of the terms of this Agreement, (b) Contractor's failure to properly design the Project, and (c) Subcontractor's lower-tier subcontractor's failure to properly perform their work.”

A type of edit can be a “structured list insert” such as the insertion of “(d) information that Recipient independently develops” into a structured list as follows: “Confidential Information shall not include (a) information that is in the public domain prior to disclosure, (b) information that Recipient currently possesses, (c) information that becomes available to Recipient through sources other than the Disclosing Party, and (d) information that Recipient independently develops.”

A type of edit can be a “leaf list insert” such as inserting “studies” into the following leaf list: “The ‘Confidential Information,’ includes, without limitation, computer programs, names and expertise of employees and consultants, know-how, formulas, studies, processes, ideas, inventions (whether patentable or not) schematics and other technical, business, financial, customer and product development plans, forecasts, strategies and information.”

A type of edit can be a “leaf list delete” such as deleting “attorneys' fees” from the following leaf list: “Subcontractor shall indemnify Contractor against all damages, fines, expenses, attorneys' fees, costs, and liabilities arising from Subcontractor's breach of this Agreement.”

A type of edit can be a “point delete” such as deleting “immediate” from the following sentence: “Recipient will provide immediate notice to Disclosing Party of all improper disclosures of Confidential Information.”

A type of edit can be a “span delete” such as deleting “consistent with the Project Schedule and in strict accordance with and reasonably inferable from the Subcontract Documents” from the following text: “The Contractor retains the Subcontractor as an independent contractor, to provide all labour, materials, tools, machinery, equipment and services necessary or incidental to complete the part of the work which the Contractor has contracted with the Owner to provide on the Project as set forth in Exhibit A to this Agreement, consistent with the Project Schedule and in strict accordance with and reasonably inferable from the Subcontract Documents.”

A type of edit can be a “point replace” such as replacing “execute” in the following text with “perform:” “The Subcontractor represents it is fully experienced and qualified to perform the Subcontract Work and it is properly equipped, organized, financed and, if necessary, licensed and/or certified to execute the Subcontract Work.”

A type of edit can be a “point insert” such as inserting “reasonably” as follows: “The Subcontractor shall use properly-qualified individuals or entities to carry out the Subcontract Work in a safe and reasonable manner so as to reasonably protect persons and property at the site and adjacent to the site from injury, loss or damage.”

A type of edit can be a “fronted constituent edit” such as the insertion of “Prior to execution of the Contract” in the following text: “Prior to execution of the Contract, Contractor shall provide Subcontractor with a copy of the Project Schedule.”

A type of edit can be an “end of sentence clause insert” such as the insertion of “except as set forth specifically herein as taking precedent over the Contractor's Contract with the Owner” as follows: “In the event of a conflict between this Agreement and the Contractor's Contract with the Owner, the Contractor's Contract with the Owner shall govern, except as set forth specifically herein as taking precedent over the Contractor's Contract with the Owner.”

A type of edit can be a “parenthetical delete” such as deleting the parenthetical “(as evidenced by its written records)” in the following text: “The term ‘Confidential Information’ and the restrictions set forth in Clause 2 and Clause 5 of this Schedule ‘B’ shall not apply to information which was known by Recipient (as evidenced by its written records) prior to disclosure hereunder, and is not subject to a confidentiality obligation or other legal, contractual or fiduciary obligation to Company or any of its Affiliates.”

A type of edit can be a “parenthetical insert” such as the insertion of “(at Contractor's sole expense)” in the following text: “The Contractor shall (at Contractor's sole expense) provide the Subcontractor with copies of the Subcontract Documents, prior to the execution of the Subcontract Agreement.”

Although many types of edits have been disclosed and described, the invention is not limited to the specific examples of types of edits provided and those of skill in the art will appreciate that other types of edits are possible and therefore fall within the scope of this invention.
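As a non-limiting illustration of how an edit-type classification might be derived from an original text and its final text, the following minimal sketch compares the two texts token by token. The function name `classify_edit`, the whitespace tokenization, and the use of Python's `difflib` are assumptions made for illustration only; the sketch covers only the full sentence, point, and span categories and is not the classification procedure of any particular embodiment.

```python
from difflib import SequenceMatcher


def classify_edit(original: str, final: str) -> str:
    """Hypothetical classifier: label the edit that turns `original` into `final`."""
    orig_tokens = original.split()
    final_tokens = final.split()

    # An empty final text means the whole sentence was removed, and vice versa.
    if orig_tokens and not final_tokens:
        return "full sentence delete"
    if final_tokens and not orig_tokens:
        return "full sentence insert"

    matcher = SequenceMatcher(None, orig_tokens, final_tokens, autojunk=False)
    ops = [op for op in matcher.get_opcodes() if op[0] != "equal"]
    if not ops:
        return "no edit"
    if len(ops) > 1:
        return "multiple edits"

    tag, i1, i2, j1, j2 = ops[0]
    # A single replace that covers both texts entirely is a full sentence replace.
    if tag == "replace" and (i1, i2, j1, j2) == (0, len(orig_tokens), 0, len(final_tokens)):
        return "full sentence replace"

    # One token changed is treated as a point edit, several tokens as a span edit.
    size = max(i2 - i1, j2 - j1)
    scope = "point" if size == 1 else "span"
    return f"{scope} {tag}"


if __name__ == "__main__":
    print(classify_edit(
        "Recipient will provide immediate notice to Disclosing Party",
        "Recipient will provide notice to Disclosing Party",
    ))
```

List, parenthetical, and fronted constituent edits would likely need structural cues (enumerators, brackets, or a syntactic parse) that this token-level sketch does not attempt to detect.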

FIG. 1 is a block diagram illustrating a system for suggesting revisions to an electronic document 100, according to some embodiments. A user device 102, such as a computer, mobile device, tablet, and the like, may be in communication with one or more application servers 101. In some embodiments, the user device 102 is in communication with application server 101 via a network 120. In some embodiments, network 120 may be a local area network or a wide area network (e.g., the Internet).

In some embodiments, the system 100 may further include one or more data sources, such as a document database 110 (sometimes referred to herein as a “seed database”). The document database 110 may be configured to store one or more documents, such as, for example, a DUA. As described above, the seed database of past edits may comprise “original text” and “final text” representing, respectively, an unedited text and the corresponding edit thereto.

In some embodiments, the user device 102, document database 110, and/or application server 101 may be co-located in the same environment or computer network, or in the same device.

In some embodiments, input to application server 101 from client device 102 may be provided through a web interface or an application programming interface (API), and the output from the application server 101 may also be served through the web interface or API.

While application server 101 is illustrated in FIG. 1 as a single computer for ease of display, it should be appreciated that the application server 101 may be distributed across multiple computer systems. For example, application server 101 may comprise a network of remote servers and/or data sources hosted on network 120 (e.g., the Internet) that are programmed to perform the processes described herein. Such a network of servers may be referred to as the backend of the clause library system 100.

FIG. 2 is a data flow diagram of a document upload process with edit suggestion, according to some embodiments. As shown in FIG. 2, a user may upload a previously unseen document, or document under analysis (DUA), 201 to application server 101 using a web interface displayed on user device 102. In some embodiments, the application server 101 stores the received DUA 201 in document database 110.

According to some embodiments, the application server 101 may comprise one or more software modules, including edit suggestion library 210 and slot generation library 220.

Edit suggestion library 210 may comprise programming instructions stored in a non-transitory computer readable memory configured to cause a processor to suggest edits to the DUA 201. The edit suggestion library 210 may perform alignment, edit suggestion, and edit transfer procedures to, inter alia, determine which sentences in a document should be accepted, rejected, or edited, and transfer edits into the document. The application server 101 may store the resulting edited document or set of one or more edits in association with the DUA 201 in document database 110. The edit suggestion features are described more fully in connection with FIGS. 3-7 and 9, described below.

In embodiments where the application server comprises a slot generation library 220, a user may upload a Typical Clause to application server 101 using a web interface displayed on user device 102. In some embodiments, the application server 101 stores the received Typical Clause in a clause library database (not shown in FIG. 2). In some embodiments, slot generation library 220 may comprise programming instructions stored in a non-transitory computer readable memory configured to cause a processor to implement slot generation features as described more fully in co-pending U.S. application Ser. No. 16/197,769, filed on Nov. 21, 2018, which is a continuation of U.S. application Ser. No. 16/170,628, filed on Oct. 25, 2018, the contents of which are incorporated herein by reference. As a result of these processes, the slot generation library 220 may output a set of one or more slot values corresponding to the received DUA. The application server 101 may store such slot values in association with the DUA 201 in document database 110.

In some embodiments, the slot generation library 220 and the edit suggestion library 210 may be used in combination. For example, the edit suggestion library 210 may benefit when used in conjunction with a slot normalization process utilizing slot generation library 220 where the surface forms of slot types are replaced with generic terms. During alignment, an unseen sentence may be aligned with an optimal set of training sentences for which the appropriate edit operation is known (e.g., accept, reject, edit). However, during alignment, small differences in sentences can tip the similarity algorithms one way or the other. By introducing slot normalization to the training data when it is persisted to the training database, and again to each sentence under analysis, the likelihood of alignment may be increased when terms differ lexically but not semantically (for instance “Information” vs “Confidential Information”). If an edit is required, the edit transfer process may use the normalized slots again to improve sub-sentence alignment. The edit transfer process may search for equal spans between the training sentence and the SUA in order to determine where edits can be made. Slot normalization may increase the length of these spans, thereby improving the edit transfer process. Additionally, suggested edits may be inserted into the DUA 201 with the proper slot value.
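The effect of slot normalization on alignment can be illustrated with a minimal sketch in which known surface forms are replaced by a generic placeholder before comparison. The function `normalize_slots`, the slot names, and the bracketed placeholder format are hypothetical and chosen only for illustration; they are not the slot generation features of the referenced applications.

```python
import re


def normalize_slots(text: str, slots: dict) -> str:
    """Replace each known surface form with a generic slot placeholder.

    `slots` maps a hypothetical slot name to the surface forms it may take.
    """
    # Longer surface forms come first so that "Confidential Information"
    # is matched before the shorter "Information".
    pairs = sorted(
        ((surface, name) for name, surfaces in slots.items() for surface in surfaces),
        key=lambda pair: len(pair[0]),
        reverse=True,
    )
    if not pairs:
        return text
    lookup = {surface: name for surface, name in pairs}
    pattern = re.compile("|".join(r"\b" + re.escape(s) + r"\b" for s, _ in pairs))
    return pattern.sub(lambda m: "[" + lookup[m.group(0)] + "]", text)


if __name__ == "__main__":
    slots = {
        "INFO": ["Information", "Confidential Information"],
        "RECEIVER": ["Recipient", "Receiving Party"],
    }
    print(normalize_slots("Recipient shall protect the Confidential Information", slots))
    print(normalize_slots("Receiving Party shall protect the Information", slots))
```

Under these assumptions both sentences normalize to the same string, so a similarity metric or an equal-span search would treat them as identical even though their surface forms differ.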

The edit suggestion system 100 may comprise some or all of modules 210, 220 as depicted in FIG. 2.

FIG. 3 is a process flow chart for editing a SUA and updating a seed database according to some embodiments. In some embodiments, process 300 may be performed by edit suggestion system 100 and/or application server 101. As shown in FIG. 3, editing an SUA may comprise selecting an original text from the seed database for analysis 310, classifying an edit-type between the selected original text and the corresponding final text 311, selecting a similarity metric based on the edit-type classification 312, and generating a similarity score 313 between the original text and the SUA. In decision step 314, the process determines whether additional original texts exist for which a similarity score should be calculated. If “yes”, the process transitions back to step 310 where a new original text is selected for analysis. If “no” the process transitions to step 320.

The process of editing an SUA may further comprise selecting a candidate original text 320, selecting an alignment method based on the edit-type classification 330, aligning the SUA with the candidate original text according to the selected alignment method 331, determining a set of one or more edit operations according to the selected alignment method 332, and creating or updating the ESUA 333. In decision step 334, the process determines whether there are additional candidate original texts and, if so, a new candidate is selected 321 and the process transitions back to step 330, selecting an alignment method based on edit-type classification. If there are no more candidates in step 334, the process transitions to step 340 where the seed database is updated with the SUA and new ESUA. Finally, the ESUA can be substituted into the DUA in place of the SUA, or the edits may be applied directly to the DUA, in step 350.

In greater detail, in step 310, a first original text can be selected from the seed database for comparison against a SUA. In step 311, the selected original text and its corresponding final text can be classified according to the type of edit that was applied to the original text. The classification of step 311 can occur in real time when an original text is selected for analysis. In the alternative, the classification of step 311 can occur as part of the creation of the seed database. In some embodiments, the classification step 311 may further include classifying a potential edit type based on the text of the SUA in the case of, for example, a leaf list and structured list edit. An example classification procedure is described in further detail below and in connection with FIG. 4.

In step 312, a similarity metric can be selected based on the type of edit. For example, the cosine distance algorithm can provide a good measure of similarity between an original text and an SUA for a single word insert. Thus, for entries in the seed database of a single word insert the process can advantageously select the cosine distance algorithm to determine the degree of similarity between the SUA and the original text. In another example, edit distance can provide a good measure of similarity between an original text and an SUA for a full sentence delete. Thus, for entries in the seed database of a full sentence delete, the process can advantageously select edit distance to determine the degree of similarity between the SUA and the original text.
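The selection of a similarity metric by edit type in steps 312 and 313 can be sketched as a simple lookup from edit-type label to scoring function. In the following illustration the label strings, the dictionary `METRIC_BY_EDIT_TYPE`, the bag-of-words weighting, and the default metric are all assumptions; the cosine score is expressed as a similarity (one minus the cosine distance) and the edit distance is computed over tokens and normalized so that both metrics return values between 0 and 1.

```python
import math
from collections import Counter


def cosine_similarity(a: str, b: str) -> float:
    """Bag-of-words cosine similarity (1 minus the cosine distance)."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def edit_distance(a: list, b: list) -> int:
    """Token-level Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to a similarity in [0, 1]."""
    ta, tb = a.lower().split(), b.lower().split()
    longest = max(len(ta), len(tb))
    return 1.0 - edit_distance(ta, tb) / longest if longest else 1.0


# Hypothetical mapping from edit-type label to metric (step 312).
METRIC_BY_EDIT_TYPE = {
    "point insert": cosine_similarity,
    "full sentence delete": edit_similarity,
}


def similarity_score(sua: str, original_text: str, edit_type: str) -> float:
    """Score the SUA against an original text with the metric for its edit type (step 313)."""
    metric = METRIC_BY_EDIT_TYPE.get(edit_type, cosine_similarity)
    return metric(sua, original_text)
```

Keeping the mapping in a table rather than in branching logic makes it straightforward to register further metrics for other edit types without changing the scoring loop.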

In step 313, a similarity score for the selected original text and the SUA is calculated based on the selected similarity metric for that edit type. In step 314, the process determines if there are additional original texts to be analyzed for similarity. In the example of a seed database there are typically many original texts to analyze and the process loops back to step 310 until all the original texts have been analyzed and a similarity score generated.

In some embodiments, a text under analysis (TUA) may be used for alignment, which comprises a window of text from the DUA, which may span multiple sentences or paragraphs, where a full edit operation may be performed. Full edit types may rely on a similarity metric calculated over a window of text before and/or after the original text and a set of such windows from the DUA. The window from the DUA with the highest score as compared to the original text's window becomes the text under analysis (TUA) into which the full edit operation is performed, producing the full edit, which may be the deletion of all or part of the TUA or the insertion of the final text associated with the original text. In some embodiments, a window of text is extracted from the original texts' document context. That window is then used to search the DUA for a similar span of text. The original text with the highest similarity value, according to one or more similarity metrics (such as cosine distance over TF/IDF, word count, and/or word embeddings for those pairs of texts), on the window of text may be selected.
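The window search described above can be sketched as sliding a fixed-size window of sentences across the DUA and keeping the window that scores highest against the original text's context window. The function `best_window`, the fixed window size, and the plain word-count cosine score are illustrative assumptions; TF/IDF weighting or word embeddings could be substituted for the scoring function.

```python
import math
from collections import Counter


def _cosine(a_tokens: list, b_tokens: list) -> float:
    """Word-count cosine similarity between two token lists."""
    va, vb = Counter(a_tokens), Counter(b_tokens)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def best_window(dua_sentences: list, context_window: list, size: int) -> tuple:
    """Return (start index, score) of the DUA window most similar to the context window."""
    target = " ".join(context_window).lower().split()
    best_start, best_score = 0, -1.0
    last_start = max(1, len(dua_sentences) - size + 1)
    for start in range(last_start):
        candidate = " ".join(dua_sentences[start:start + size]).lower().split()
        score = _cosine(candidate, target)
        if score > best_score:
            best_start, best_score = start, score
    return best_start, best_score


if __name__ == "__main__":
    dua = [
        "The parties agree to the terms.",
        "Recipient shall keep information secret.",
        "Disclosing Party may seek injunctive relief.",
        "This Agreement is governed by law.",
    ]
    context = [
        "Recipient shall keep all information confidential.",
        "Disclosing Party is entitled to seek injunctive relief.",
    ]
    start, score = best_window(dua, context, size=2)
    print(start, round(score, 3))
```

The sentences of the winning window would then serve as the TUA into which a full sentence insert or delete is transferred.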

In some embodiments, once a span edit, such as the deletion of a parenthetical or other short string longer than a single word, is detected, the best original text from among the set of aligned original texts may be selected. A Word Mover Distance similarity metric may be used to compare the deleted span with spans in the TUA and the original text with the nearest match to a span in the TUA is selected. This allows semantically similar but different spans to be aligned for editing. In some embodiments, span edits may rely on a Word Embedding based similarity metric to align semantically related text spans for editing. The relevant span of the original text is compared to spans of the TUA such that semantically similar spans are aligned where the edit operation could be performed.

In step 320, a candidate original text can be selected. The candidate can be selected based on the similarity score calculated in step 313. There can be multiple candidate original texts. For example, in step 320, the original text having the highest similarity score, or an original text exceeding some threshold similarity score, or one of the original texts having the top three similarity scores may be selected. Selecting a candidate original text in this step 320 may consider other factors in addition to the similarity score such as attributes of the statement under analysis. In any event, each original text that meets the selection criteria can be considered a candidate original text.
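The selection criteria of step 320 (highest score, a score threshold, or the top three scores) can be sketched as a single ranking function. The function name `select_candidates` and its parameters are hypothetical, and the sketch considers only the similarity score, not other attributes of the statement under analysis.

```python
def select_candidates(scores: dict, threshold: float = None, top_k: int = None) -> list:
    """Return original texts ranked by score, filtered by an optional threshold and top-k cut.

    `scores` maps each original text (or its identifier) to its similarity score.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        # Keep only texts whose score exceeds the threshold.
        ranked = [(text, score) for text, score in ranked if score > threshold]
    if top_k is not None:
        ranked = ranked[:top_k]
    return [text for text, _ in ranked]


if __name__ == "__main__":
    scores = {"orig-1": 0.91, "orig-2": 0.40, "orig-3": 0.77, "orig-4": 0.85}
    print(select_candidates(scores, top_k=1))        # single best candidate
    print(select_candidates(scores, threshold=0.8))  # every text above the threshold
    print(select_candidates(scores, top_k=3))        # top three candidates
```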

In step 330, an alignment method can be selected based on the edit-type classification for the selected candidate original text. Improved alignment between the SUA, original text, and final text can be achieved when the alignment method is selected based on the edit-type classification rather than employing a single alignment method for all alignments. For example, a longest-matching substring can provide a good alignment between an original text and an SUA for a single word insert. Thus, for entries in the seed database of a single word insert, the process can advantageously select longest matching substring to align the SUA and the original text. In another example, a constituent-subtree alignment can provide a good alignment between an original text and an SUA for a structured-list insert. Thus, for entries in the seed database of structured-list insert the process can advantageously select a constituent-subtree alignment to align the SUA and the original text. Additional alignment methods are described in further detail below.
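As an illustration of step 330, the following minimal sketch selects an alignment function from a lookup keyed by edit-type label and implements a longest-matching-substring alignment over tokens using Python's `difflib`. The label strings, the `ALIGNER_BY_EDIT_TYPE` table, and the fallback behavior are assumptions; a constituent-subtree aligner is omitted because it would require a syntactic parser.

```python
from difflib import SequenceMatcher


def longest_matching_substring(sua: str, original: str) -> tuple:
    """Return (start in SUA, start in original, length) of the longest shared token run."""
    s_tokens, o_tokens = sua.split(), original.split()
    matcher = SequenceMatcher(None, s_tokens, o_tokens, autojunk=False)
    match = matcher.find_longest_match(0, len(s_tokens), 0, len(o_tokens))
    return match.a, match.b, match.size


# Hypothetical table of alignment methods keyed by edit-type classification.
ALIGNER_BY_EDIT_TYPE = {
    "point insert": longest_matching_substring,
    # "structured list insert": constituent_subtree_alignment  (parser-based, not shown)
}


def select_alignment_method(edit_type: str):
    """Pick the aligner for an edit type, defaulting to longest matching substring."""
    return ALIGNER_BY_EDIT_TYPE.get(edit_type, longest_matching_substring)


if __name__ == "__main__":
    align = select_alignment_method("point insert")
    print(align("the quick brown fox", "a quick brown dog"))
```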

In step 331, the SUA and the candidate original text are aligned according to the alignment method selected in step 330. In step 332, a set of one or more edit operations is determined according to the alignment method selected in step 330. In some embodiments, the set of one or more edit operations may be determined by aligning the candidate original text with its associated final text according to the alignment method selected in step 330, and determining a set of one or more edit operations that convert the aligned original text to the aligned final text. In such embodiments, in step 333 the ESUA is created by applying the set of one or more edit operations to the SUA.

In some embodiments, in step 332, the set of one or more edit operations may be determined by determining a set of edit operations that convert the SUA to the final text associated with the original text. In such embodiments, in step 333 the ESUA is created by applying to the SUA one or more edit operations from the set of one or more edit operations according to the alignment method.

Step 334 can be consistent with multiple alignment, that is, where an SUA is aligned and is edited in accordance with multiple original/final texts from the seed database. In step 334, it can be determined whether there are additional candidate original texts that meet the selection criteria (e.g., exceed a similarity score threshold, top three, etc.). If yes, the process proceeds to step 321 where a new candidate original text is selected. If no, the process can proceed to step 340.

In step 340, the seed database can be updated with the SUA and the ESUA which, after being added to the seed database, would be considered an “original text” and a “final text,” respectively. In this way, the methods disclosed herein can learn from new DUAs and new SUAs by adding to the seed database.

In some embodiments, there may also be a step between 334 and 340 where a human user reviews the proposed ESUA of the EDUA to (a) accept/reject/revise the proposed revisions or (b) include additional revisions. This feedback may be used to improve the similarity score metrics (e.g., by training the system to identify similar or dissimilar candidate original texts) and/or the suggested edit revision process (e.g., by training the system to accept or reject certain candidate alignments) for specific user(s) of the system 100.

In step 350, the ESUA can be recorded back into the DUA in place of the SUA, or the edit can be applied to the text of the DUA directly.

Training Data Creation

It is contemplated that potential users of the invention may not have a large database of previously edited documents from which to generate the seed database. To address this limitation, embodiments of the invention include generating a seed database from documents provided by a third party or from answers to a questionnaire. For example, if a user is a property management company that does not have a sufficient base of previously edited documents from which to generate a seed database, embodiments of the invention may include sample documents associated with other property management companies or publicly available documents (e.g., from EDGAR) that can be used to populate the seed database.

In another example, if a user does not have a sufficient base of previously edited documents from which to generate a seed database, embodiments of the invention may ask legal questions to the user to determine a user's tolerance for certain contractual provisions. In greater detail, during a setup of the invention, the user may be asked, among other things, whether they will agree to “fee shifting” provisions where costs and attorneys' fees are borne by the non-prevailing party. If yes, the invention can populate the seed database with original/final texts consistent with “fee shifting,” e.g., the original and final texts contain the same fee shifting language. If not, the invention can populate the seed database with original/final texts consistent with no “fee shifting,” e.g., the original text contains fee shifting language and the final text does not contain fee shifting language.

FIG. 4 illustrates an edited document, according to some embodiments. As shown in FIG. 4, edited document 400 may comprise an Open Document Format (ODT) or Office Open XML (OOXML) type document with tags representing portions of the original document that have been revised by an editor. In some embodiments, the tags may comprise “Track-Changes” tags as used by certain document editing platforms.

As shown in FIG. 4, edited document 400 may comprise a plurality of classified edits, such as a point edit (401); a chunk delete (403); a list item insert (405); a leaf list insert (407); a full sentence delete (409); and a paragraph insert (411). Additional edits not shown in edited document 400 may comprise, e.g., a span edit and a full sentence insert.

Edit Suggestion System 100 may ingest a document 400 by traversing its runs in order. In some embodiments, a “run” may refer to the run element defined in the Open XML File Format. Every run may be ingested and added to a string representing the document in both its old (original) and new (edited/final) states. The system 100 may note, for each subsequence reflecting each run, whether each subsequence appears in the old and new states. A subsequence may comprise, for example, an entire document, paragraphs, lists, paragraph headers, list markers, sentences, sub-sentence chunks and the like. This list is non-exhaustive, and a person of ordinary skill in the art may recognize that additional sequences of text, or structural elements of text documents, may be important to capture.
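As a rough illustration of building the old and new states from runs, the sketch below assumes the runs have already been extracted from the document as (text, status) pairs. That representation, the status labels, and the function name `build_states` are hypothetical; parsing of the underlying track-changes markup is not shown.

```python
from typing import List, Tuple

# Hypothetical run representation: (text, status), where status is one of
# "same" (unchanged), "inserted" (tracked insertion), "deleted" (tracked deletion).
Run = Tuple[str, str]


def build_states(runs: List[Run]) -> Tuple[str, str]:
    """Traverse runs in order and assemble the old and new document strings."""
    old_parts, new_parts = [], []
    for text, status in runs:
        if status in ("same", "deleted"):
            old_parts.append(text)   # present before editing
        if status in ("same", "inserted"):
            new_parts.append(text)   # present after editing
    return "".join(old_parts), "".join(new_parts)


# Illustrative runs only.
runs = [("any ", "same"), ("material ", "inserted"), ("defects", "same")]
old_state, new_state = build_states(runs)
```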

A set of strings may be assembled from each subsequence, where one string in the set reflects an old state (e.g., original text) and a second string in the set reflects a new state (e.g., final or edited text). In some embodiments, each string is processed to identify linguistic features, such as word boundaries, parts of speech, list markers, list items, paragraph/clause headers, and sentence/chunk boundaries. In some embodiments, the system requires identification of sentence boundaries for alignments. However, the system may determine these linguistic features statistically; as a result, small changes in the data can result in big changes in the boundaries output. Therefore, it may be necessary to create a merger of all sentences where, given overlapping but mismatched spans of text, spans representing the largest sequences of overlap are retained.

Once this merger of all sentences has been determined, the set of merged sentences may be used to identify whether one or more edit types have occurred. Such edit types may include, for example, a full edit (e.g., sentence or paragraph), list edit (structured or leaf list), chunk edit, point edit, or span edit, among others.

In some embodiments, in order to identify full paragraph edits, the system first determines, for strings corresponding to a paragraph in document 400, whether there are characters in both the old and new states. If the old state has no characters and the new state does, that is a full paragraph insert (FPI); if the new state has no characters and the old state does, that is a full paragraph delete (FPD).
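The full paragraph test reduces to checking which of the two states is empty. A minimal sketch follows; the function name is hypothetical, and treating a whitespace-only string as having no characters is an assumption made here for illustration.

```python
from typing import Optional


def classify_full_paragraph(old: str, new: str) -> Optional[str]:
    """Return "FPI", "FPD", or None when the paragraph exists in both states."""
    # Assumption: whitespace-only text counts as "no characters".
    old_has, new_has = bool(old.strip()), bool(new.strip())
    if not old_has and new_has:
        return "FPI"  # full paragraph insert
    if old_has and not new_has:
        return "FPD"  # full paragraph delete
    return None
```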

In some embodiments, in order to identify full sentence edits, for each sentence or special sentence in a paragraph, the system attempts to pair each sentence in each state (e.g., original) with a sentence in the other state (e.g., final). If the pairing succeeds, then no full change occurred. If the pairing fails for a sentence in the old state (e.g., original), the sentence is tagged as a full sentence delete (FSD); if the pairing fails for a sentence in the new state (e.g., final), the sentence is tagged as a full sentence insert (FSI).
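The pairing step can be sketched as follows. How sentences are paired is not specified above, so this sketch assumes a greedy pairing on character-level similarity from Python's `difflib` with an arbitrary threshold of 0.6; the function name, the threshold, and the "PAIRED" label are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def pair_sentences(old: List[str], new: List[str],
                   min_ratio: float = 0.6) -> Tuple[List[str], List[str]]:
    """Tag each old sentence as PAIRED or FSD and each new one as PAIRED or FSI."""
    tags_old = ["FSD"] * len(old)   # unpaired old sentences are deletes
    tags_new = ["FSI"] * len(new)   # unpaired new sentences are inserts
    used = set()
    for i, old_sentence in enumerate(old):
        best_j, best_ratio = None, min_ratio
        for j, new_sentence in enumerate(new):
            if j in used:
                continue
            ratio = SequenceMatcher(None, old_sentence, new_sentence,
                                    autojunk=False).ratio()
            if ratio >= best_ratio:
                best_j, best_ratio = j, ratio
        if best_j is not None:
            used.add(best_j)
            tags_old[i] = tags_new[best_j] = "PAIRED"
    return tags_old, tags_new


# Illustrative sentences only.
old = ["Fees are due in 30 days.", "Late fees accrue monthly."]
new = ["Fees are due in 45 days.", "Either party may terminate on notice."]
tags_old, tags_new = pair_sentences(old, new)
```

The same pairing idea would carry over to chunks and list items, with the unit of comparison changed accordingly.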

In some embodiments, in order to identify full chunk edits, for each sentence or special sentence in a paragraph, the system attempts to pair each constituent in each state (e.g., original) with a chunk in the other state (e.g., final). If the pairing succeeds, then no full change occurred. If the pairing fails for a chunk in the old state (e.g., original), the chunk is tagged as a full chunk delete (FCD); if the pairing fails for a chunk in the new state (e.g., final), the chunk is tagged as a full chunk insert (FCI).

In some embodiments, in order to identify structured list edits, the system attempts to pair list items in a structured list in each state (e.g., original) with a list item in the other state (e.g., final). If the pairing succeeds, then no structured list edit occurred. If the pairing fails for a list item in the old state (e.g., original), the list item is tagged as a List Item Delete; if the pairing fails for a list item in the new state (e.g., final), the list item is tagged as a List Item Insert.

In some embodiments, if the old state (e.g., original) and the new state (e.g., final) are equal, then the string of text is labeled as an “accept.”

In some embodiments, if the new state and the old state are not equal, but the change is not a “Full Edit” (e.g., FPD, FPI, FSD, or FSI), the string of text is labeled as a “revise.” Revises may be labeled as either “Point Edits” or “Span Edits.” Point Edits are insertions, single word replaces, and single word deletes. Span Edits are multi word deletes and multi word replaces. In some embodiments, a revise may be labeled as a “Full Edit” (e.g., FPD, FPI, FSD, or FSI).
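The accept/revise distinction and the point/span split can be sketched with a word-level diff. The sketch assumes full edits have already been screened out, uses whitespace tokenization, and treats any delete or replace that removes more than one word as a span edit; these choices and the function name are assumptions for illustration.

```python
from difflib import SequenceMatcher


def classify_revise(old: str, new: str) -> str:
    """Return "accept", "point edit", or "span edit" for a non-full edit."""
    if old == new:
        return "accept"
    old_tokens, new_tokens = old.split(), new.split()
    opcodes = SequenceMatcher(None, old_tokens, new_tokens,
                              autojunk=False).get_opcodes()
    for tag, i1, i2, _j1, _j2 in opcodes:
        # Insertions always count as point edits; a delete or replace that
        # removes more than one word makes the revise a span edit.
        if tag in ("delete", "replace") and (i2 - i1) > 1:
            return "span edit"
    return "point edit"
```

For instance, inserting the single word “material” would be labeled a point edit, while deleting the phrase “as established by documentary evidence” would be labeled a span edit.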

In some embodiments, unstructured, syntactically coordinated natural language lists are identified with a regular pattern of part-of-speech tags, sentence classifications, and other features that are indicative of a list, manually tuned to fit such sequences.

For example, one embodiment of such a pattern may be: D?N+((N+),)*CN+; where D represents a token tagged as a determiner, N represents a token tagged as a noun, C represents a token tagged as a conjunction, and “,” represents comma tokens. Sequences that would match such a pattern include, for example: (i) any investor, broker, or agent; (ii) investor, broker, or agent; (iii) investor, stock broker, or agent; and (iv) all brokers or agents.
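One way to try such a pattern is to map each token to a one-letter tag and run an ordinary regular expression over the tag string. The sketch below makes two assumptions that are not from the disclosure: the tagger is a toy lookup table standing in for a real part-of-speech tagger, and the expression is written as a variant in which the comma follows each noun group, `D?(N+,)*N+,?CN+`, so that all four example sequences match.

```python
import re

# Assumed variant of the pattern: comma placed after each noun group.
LEAF_LIST = re.compile(r"D?(N+,)*N+,?CN+")

# Toy tag lookup (hypothetical); every other word is treated as a noun.
TAGS = {"any": "D", "all": "D", "or": "C", "and": "C", ",": ","}


def tag_string(text: str) -> str:
    """Convert text to a string of one-letter tags, e.g. 'DN,N,CN'."""
    tokens = re.findall(r"\w+|,", text.lower())
    return "".join(TAGS.get(token, "N") for token in tokens)


def is_leaf_list(text: str) -> bool:
    """True when the whole tag string matches the leaf-list pattern."""
    return LEAF_LIST.fullmatch(tag_string(text)) is not None


examples = [
    "any investor, broker, or agent",
    "investor, broker, or agent",
    "investor, stock broker, or agent",
    "all brokers or agents",
]
matches = [is_leaf_list(example) for example in examples]
```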

In some embodiments, additional information may be captured as part of the training process. For example, text classification (e.g., fee shifting; indemnification; disclosure required by law) may assist with augmenting the training data. The additional information may assist with creating a seed database through a question and answer system. Another example may include identifying choice of law SUA(s), and then identifying the jurisdictions or states within those provisions (e.g., New York, Delaware), which may help with a question and answer learning rule such as “always change the choice of law to New York.” Another example may include classifying “term” clauses and durations in such clauses in order to learn rules about preferred durations.

Point Edit Type Alignment

FIG. 5 is an illustration of a point edit-type alignment according to some embodiments. As shown in FIG. 5, the statement under analysis (SUA 510) is matched with a candidate original text (OT1 520) based on a similarity score as described above. As highlighted in box 505, there is a point edit type between the original text (OT1 520) and the final text (FT1 530) because of the insertion of the word “material” into the final text (FT1 530). Accordingly, an alignment method applicable for a point edit may be selected as shown in FIG. 5.

In some embodiments, the selected alignment may comprise aligning the SUA 510 to the original text “OT1” 520, aligning a corresponding final text “FT1” 530 to the original text 520, determining one or more edit operations to transform the original text “OT1” 520 into the final text “FT1” 530 according to the alignment (e.g., insertion of the word “material”), and creating the ESUA 540 by applying the one or more edit operations to the statement under analysis “SUA” 510.

In other embodiments, the selected alignment may comprise aligning the SUA 510 to the original text “OT1” 520, obtaining a corresponding final text “FT1” 530, determining a set of one or more edit operations to transform the SUA 510 into the FT1 530, and applying to the SUA 510 the one or more edit operations consistent with the first alignment (e.g., insertion of the word “material”).

These alignment techniques are disclosed more fully in U.S. application Ser. No. 15/227,093 filed Aug. 3, 2016, which issued as U.S. Pat. No. 10,216,715, and U.S. application Ser. No. 16/197,769, filed on Nov. 21, 2018, which is a continuation of U.S. application Ser. No. 16/170,628, filed on Oct. 25, 2018, which are hereby incorporated by reference in their entirety.
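The first point-edit procedure above (derive the edit from the original text to the final text, then replay it on the SUA) can be sketched with a token-level diff. The sentences below are invented for illustration and are not the text of FIG. 5; the function names are hypothetical, and anchoring the insertion on the single token to its left is a simplifying assumption, not the referenced alignment technique.

```python
from difflib import SequenceMatcher
from typing import Iterator, List, Optional, Tuple


def point_insertions(original: List[str],
                     final: List[str]) -> Iterator[Tuple[Optional[str], List[str]]]:
    """Yield (left context token, inserted tokens) for each pure insertion."""
    opcodes = SequenceMatcher(None, original, final, autojunk=False).get_opcodes()
    for tag, i1, _i2, j1, j2 in opcodes:
        if tag == "insert":
            left = original[i1 - 1] if i1 > 0 else None
            yield left, final[j1:j2]


def apply_point_insertions(sua: str, original: str, final: str) -> str:
    """Replay the insertions found between original and final onto the SUA."""
    edited = sua.split()
    for left, inserted in point_insertions(original.split(), final.split()):
        if left is None:
            position = 0                        # insertion at sentence start
        elif left in edited:
            position = edited.index(left) + 1   # first occurrence of the anchor
        else:
            continue                            # no anchor in the SUA; skip
        edited[position:position] = inserted
    return " ".join(edited)


# Hypothetical texts, not those shown in FIG. 5.
ot1 = "Seller shall repair any defects in the goods"
ft1 = "Seller shall repair any material defects in the goods"
sua = "Vendor must fix any defects found in the products"
esua = apply_point_insertions(sua, ot1, ft1)
```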

Semantic Alignment

FIG. 6 is an illustration of a point edit-type alignment according to some embodiments. In some embodiments, the alignment procedures described above in connection with FIG. 5 and elsewhere herein do not require exact overlaps. For example, FIG. 6 illustrates SUA 610, which is nearly identical to SUA 510 in FIG. 5 except for the use of the word “deformity” in place of “defects.”

According to some embodiments, the training data is augmented to generate additional instances of sentences that are changed to use, e.g., paraphrases of words and phrases in the training sentence. Additional features of the training sentences may be extracted from document context and used to enhance alignment and support different edit types. Example features may include word embeddings for sentence tokens, user, counterparty, edit type, and edit context (e.g., nearby words/phrases). Augmentation of the training data in this manner may allow the system to perform semantic subsentence alignment, e.g., by enabling sub-sentence similarity tests to consider semantic similarity based on word embeddings.

Semantic subsentence alignment may enable the point edit type alignment procedure as disclosed above in connection with FIG. 5 to work when exact overlaps are not available, for example, ‘defects’ vs. ‘deformity’ as shown in FIG. 6. Referring to FIG. 6, the statement under analysis (SUA 610) may be matched with the same candidate original text (OT1 520) based on a similarity score as described above. As highlighted in box 505, there is a point edit type between the original text (OT1 520) and the final text (FT1 530) because of the insertion of the word “material” into the final text (FT1 530). In view of the point edit type 505, the system may proceed with performing the point edit type alignment procedure described above in connection with FIG. 5 in addition to semantic subsentence alignment. For example, using semantic subsentence alignment, the system is able to align “deformity” recited in SUA 610 with “defects” recited in OT1 520, as indicated by the arrows, and recognize the point edit operation of inserting the term “material” into the ESUA 640.

Span Edit Type Alignment

In some embodiments, span delete edit types might not require an alignment of the text that surrounds the deleted text. For example, Table A below depicts an example where a SUA has a high similarity score with four different original texts because of the inclusion of the clause “as established by documentary evidence.” Each original text has a “SPAN” edit type operation as reflected by the deletion of the phrase “as established by documentary evidence” between each Original Text and its respective Final Text. In this example, and as shown in FIG. 7, an alignment of the text surrounding the deleted phrase is unnecessary.

TABLE A

SUA (identical in each row): (b) . . . available to the Recipient on a non-confidential basis from a third-party source, as established by documentary evidence, provided that such third party is not . . .

ESUA (identical in each row): (b) . . . available to the Recipient on a non-confidential basis from a third-party source provided that such third party is not . . .

Row 1 (Edit Op.: SPAN)
Original Text: (b) Such Proprietary Information is already in the possession of the Receiving Party or its representatives, as established by documentary evidence, without restrict and prior to any disclosure hereunder
Final Text: (b) Such Proprietary Information is already in the possession of the Receiving Party or its representatives without restrict and prior to any disclosure hereunder

Row 2 (Edit Op.: SPAN)
Original Text: d. is, as established by documentary evidence, independently developed by the Receiving Party.
Final Text: d. is independently developed by the Receiving Party.

Row 3 (Edit Op.: SPAN)
Original Text: (iii) was already in the possession of the Recipient or its Representatives, as established by documentary evidence, on a non-confidential basis from a source other than the Disclosing Parties prior to the date hereof
Final Text: (iii) was already in the possession of the Recipient or its Representatives on a non-confidential basis from a source other than the Disclosing Parties prior to the date hereof

Row 4 (Edit Op.: SPAN)
Original Text: (c) was lawfully acquired by the Recipient from a third party, as established by documentary evidence, and not subject to any obligation of confidence to the party furnishing the Confidential Information.
Final Text: (c) was lawfully acquired by the Recipient from a third party and not subject to any obligation of confidence to the party furnishing the Confidential Information.

FIG. 7 is an illustration of a span edit-type alignment according to some embodiments. As shown by the arrows in FIG. 7, an alignment of the text surrounding the deleted phrase “as established by documentary evidence” is not necessary. Namely, where the SUA (710) and an OT1 (720) are above a certain similarity threshold, and the SUA (710) contains the same text as the OT1 (720) that was deleted (or replaced) to arrive at the FT1 (730), the same text present in the SUA (710) may be deleted to arrive at the ESUA (740). For example, as shown in FIG. 7, since there is the same text “, as established by documentary evidence,” in SUA (710) and OT1 (720), and there is a span delete edit type between OT1 (720) and FT1 (730) for that same text, then the system arrives at the ESUA (740) by deleting the same text from SUA (710).
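The span delete just described can be sketched as: find the text deleted between the original and final texts, then remove the same text from the SUA if it is present. The sketch uses a character-level diff from Python's `difflib` and exact substring matching; both are simplifying assumptions, the function names are hypothetical, and the sample strings are taken from the second row of Table A.

```python
from difflib import SequenceMatcher
from typing import List


def deleted_spans(original: str, final: str) -> List[str]:
    """Return the character spans present in original but deleted in final."""
    opcodes = SequenceMatcher(None, original, final, autojunk=False).get_opcodes()
    return [original[i1:i2] for tag, i1, i2, _j1, _j2 in opcodes if tag == "delete"]


def apply_span_delete(sua: str, original: str, final: str) -> str:
    """Delete from the SUA any span that was deleted from the original text."""
    edited = sua
    for span in deleted_spans(original, final):
        if span in edited:
            edited = edited.replace(span, "", 1)  # remove first occurrence only
    return edited


ot = "is, as established by documentary evidence, independently developed by the Receiving Party."
ft = "is independently developed by the Receiving Party."
sua = ("available to the Recipient on a non-confidential basis from a third-party "
       "source, as established by documentary evidence, provided that such third party is not")
esua = apply_span_delete(sua, ot, ft)
```

No alignment of the surrounding text is attempted here, which mirrors the observation that the context around the deleted phrase need not match.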

In some embodiments, the training data augmentation process described above may also be used to enhance alignment and support span edits. For example, semantic subsentence alignment may enable the span edit type alignment procedure as disclosed above in connection with FIG. 7 to work when exact overlaps are not available.

According to some embodiments, span edits may rely heavily on two factors: (1) sentence or paragraph context, and (2) edit frequency. As part of the alignment process, the system may first extract candidate original text matches against a SUA as described above, and the candidate original text may indicate that a span edit is required based on the associated final candidate text. Next, the system may cluster span edits across all available training data (e.g., original and final texts) to find a best match for the SUA's context.

In some embodiments, the system may choose from the cluster the best span edit to make in this context. The selection may be based on some combination of context (words nearby) and frequency of the edit itself (e.g., how often has the user deleted a parenthetical that has high similarity to the one in the selected original text, within this context and/or across contexts). In some embodiments, if the selection is not the same as the best matching (similar) original text, the system may replace that selection with an original text with a higher similarity score.

Once the candidate original text is selected, the system may apply the edit using the alignment procedures described herein. An example of the semantic alignment as applied for a span delete is shown below in Table B.

TABLE B

SUA (identical in each row): (b) . . . available to the Recipient on a non-confidential basis from a third-party source, as established by documentary evidence, provided that such third party is not . . .

ESUA (identical in each row): (b) . . . available to the Recipient on a non-confidential basis from a third-party source provided that such third party is not . . .

Row 1 (Edit Op.: SPAN)
Original Text: (iv) is independently developed by the receiving party without reference to the Confidential information of the other party, which can be demonstrated by written record.
Final Text: (iv) is independently developed by the receiving party without reference to the Confidential information of the other party.

Row 2 (Edit Op.: SPAN)
Original Text: (iii) was already in the possession of the Recipient or its Representatives (as demonstrated by written records) on a non-confidential basis from a source other than the Disclosing Parties prior to the date hereof . . .
Final Text: (iii) was already in the possession of the Recipient or its Representatives on a non-confidential basis from a source other than the Disclosing Parties prior to the date hereof . . .

Row 3 (Edit Op.: SPAN)
Original Text: (c) was lawfully acquired by the Recipient from a third party (as evidenced in the Recipient's written records) and not subject to any obligation of confidence to the party furnishing the Confidential Information.
Final Text: (c) was lawfully acquired by the Recipient from a third party and not subject to any obligation of confidence to the party furnishing the Confidential Information.

Full Edit Type Alignment

In some embodiments where the edit type comprises a full sentence insert (FSI), an alignment method may be selected based on the FSI edit type. Each SUA is compared to semantically similar original texts. If one of the original texts is labeled with an FSI edit operation, then that same FSI edit operation that was applied to the original text is applied to the SUA. An example of this alignment method for FSI edit operations is shown in Table C, below.

TABLE C

SUA (identical in each row): Therefore, the Receiving Party agrees that the Disclosing Party shall be entitled to seek injunctive and/or other equitable relief, in addition to any other remedies available at law or equity to the Disclosing Party.

ESUA (identical in each row): Therefore, the Receiving Party agrees that the Disclosing Party shall be entitled to seek injunctive and/or other equitable relief, in addition to any other remedies available at law or equity to the Disclosing Party. Neither Party shall be liable for consequential damages.

Row 1 (Edit Op.: FSI)
Original Text: Any relief is in addition to and not in replace of any appropriate relief in the way of monetary damages.
Final Text: Any relief is in addition to and not in replace of any appropriate relief in the way of monetary damages. Neither Party shall be liable for consequential damages.

Row 2 (Edit Op.: FSI)
Original Text: Therefore, the Disclosing Party shall be entitled to seek equitable or injunctive relief, in addition to other remedies to which it may be entitled at law or equity.
Final Text: Therefore, the Disclosing Party shall be entitled to seek equitable or injunctive relief, in addition to other remedies to which it may be entitled at law or equity. Notwithstanding the foregoing, neither Party shall be liable for consequential damages.

Row 3 (Edit Op.: FSI)
Original Text: Such remedies shall not be deemed to be the exclusive remedies for breach of this Agreement, but shall be in addition to all other remedies available at law or in equity.
Final Text: Such remedies shall not be deemed to be the exclusive remedies for breach of this Agreement, but shall be in addition to all other remedies available at law or in equity. Neither Party shall be liable for consequential damages.
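The FSI behavior shown in Table C can be sketched as: find the sentences that appear in the final text but not in the original text, and add them to the SUA. The sketch assumes a naive punctuation-based sentence splitter and appends the inserted sentence directly after the SUA; the function names are hypothetical, and the strings are taken from the first row of Table C.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def apply_fsi(sua: str, original: str, final: str) -> str:
    """Append to the SUA each sentence that was inserted into the final text."""
    known = set(split_sentences(original))
    inserted = [s for s in split_sentences(final) if s not in known]
    return " ".join([sua] + inserted)


original = ("Any relief is in addition to and not in replace of any appropriate "
            "relief in the way of monetary damages.")
final = original + " Neither Party shall be liable for consequential damages."
sua = ("Therefore, the Receiving Party agrees that the Disclosing Party shall be "
       "entitled to seek injunctive and/or other equitable relief, in addition to "
       "any other remedies available at law or equity to the Disclosing Party.")
esua = apply_fsi(sua, original, final)
```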

In some embodiments, if a single SUA triggers multiple FSI(s), semantically similar FSI(s) may be clustered together so that multiple FSIs aren't applied to the same SUA.

In some embodiments, the text of the paragraph/document/etc. can also be searched for semantically similar text to the FSI in order to ensure that the FSI isn't already in the DUA. A similar process can be used for full paragraph insertions and list editing. For example, where there is a full paragraph insertion edit operation indicated by the selected candidate original text, the system may check to make sure that the paragraph (or the context of the inserted paragraph) is not already in the DUA.
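A presence check of this kind can be sketched as a similarity scan over the sentences of the DUA. The sketch substitutes character-level similarity from Python's `difflib` for a semantic similarity measure, which is a stand-in and not the measure described above; the function name and the 0.8 threshold are likewise assumptions.

```python
from difflib import SequenceMatcher
from typing import Iterable


def already_present(candidate: str, dua_sentences: Iterable[str],
                    threshold: float = 0.8) -> bool:
    """True when some DUA sentence is at least `threshold` similar to the candidate."""
    return any(
        SequenceMatcher(None, candidate.lower(), sentence.lower(),
                        autojunk=False).ratio() >= threshold
        for sentence in dua_sentences
    )


fsi = "Neither Party shall be liable for consequential damages."
dua_with = ["Neither party shall be liable for any consequential damages."]
dua_without = ["This Agreement is governed by the laws of New York."]
skip_insert = already_present(fsi, dua_with)
do_insert = not already_present(fsi, dua_without)
```

When the check returns true, the insertion would be skipped so that the DUA does not end up with two near-duplicate sentences.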

An FSI may be added to the DUA in a location different from the SUA that triggered the FSI. In some embodiments, when an original text is an FSI and is selected as matching to the SUA, all similar FSIs are also retrieved from the seed database. The document context is then considered to determine if any of that set of FSIs' original texts are preferred, by frequency, over the SUA that triggered the FSI. If this is the case, and that original text, or significantly similar text, occurs in the DUA, the FSI is placed after that new SUA, rather than the triggering SUA.

In some embodiments, another alignment method may be chosen where the edit type is a full sentence delete (FSD). Each SUA may be compared to semantically similar original texts. If one of the original texts is labeled with an FSD edit operation, then that same FSD edit operation that was applied to the original text is applied to the SUA. This same process can be done at the sentence, chunk, paragraph, etc. level, and an example of this alignment method for an FSD edit operation is shown in Table D below.

TABLE D

SUA (identical in each row): If either Disclosing Party or Receiving Party employs legal counsel to enforce any rights arising out of or relating to this Agreement, the prevailing party shall be entitled to recover reasonable attorney's fees and costs.

Final Text (each row): (empty)

ESUA (each row): (empty)

Row 1 (Edit Op.: FSD)
Original Text: If either party employs attorneys to enforce any rights arising out of or relating to this Agreement, the prevailing party shall be entitled to recover reasonable attorneys' fees and expenses.

Row 2 (Edit Op.: FSD)
Original Text: The prevailing Party in any action to enforce this Agreement shall be entitled to costs and attorneys' fees.

Row 3 (Edit Op.: FSD)
Original Text: The prevailing Party in any action to enforce this Agreement shall be entitled to all costs, expenses and reasonable attorneys' fees incurred in bringing such action.

Row 4 (Edit Op.: FSD)
Original Text: Company agrees to reimburse Disclosing Party and its Representatives for all costs and expenses, including reasonable attorneys' fees, incurred by them in enforcing the terms of this Agreement.

In some embodiments where there is a full paragraph edit type, an alignment method may be selected based on the full paragraph edit type. For example, in the case of a full paragraph insert, the system may cluster typically inserted paragraphs from training data/original texts according to textual similarity. The system may then select the most appropriate paragraph from the training data clusters by aligning paragraph features with the features of the DUA. Paragraph features may include information about the document that the paragraph was extracted from originally, such as, for example: counterparty, location in the document, document v. document similarity, nearby paragraphs, etc. In some embodiments, the system may further perform a presence check for the presence of the selected paragraph or highly similar paragraphs or text in the DUA. In some embodiments, the system may insert a paragraph using paragraph features in order to locate the optimal insertion location.

In some embodiments, another alignment method may be chosen where the edit type is a full paragraph delete (FPD). Each SUA may be compared to semantically similar original texts. If one of the original texts is labeled with an FPD edit operation, then that same FPD edit operation that was applied to the original text is applied to the SUA.

An example of this alignment method for an FPD edit operation is shown in Table E below.

TABLE E

SUA (identical in each row): Each party recognizes that nothing in this Agreement is intended to limit any remedy of the other party. In addition, each party agrees that a violation of this Agreement could cause the other party irreparable harm and that any remedy at law may be inadequate. Therefore, each party agrees that the other party shall have the right to an order restraining any breach of this Agreement and for any other relief the non-breaching party deems appropriate.

Final Text (each row): (empty)

ESUA (each row): (empty)

Row 1 (Edit Op.: FPD)
Original Text: 11. Because an award of money damages would be inadequate for any breach of this Agreement by the Receiving Party, the Receiving Party agrees that in the event of any breach of this Agreement, the Disclosing Party shall also be entitled to equitable relief. Such remedies shall not be the exclusive remedies for any breach of this Agreement, but shall be in addition to all other remedies available at law or equity.

Row 2 (Edit Op.: FPD)
Original Text: 5 Remedies. The Company acknowledges that damages would not be an adequate remedy and that the Seller and the Target would be irreparably harmed if any of the provisions of this letter agreement are not performed strictly in accordance with their specific terms or are otherwise breached. Accordingly, you agree that each of the Seller and the Target is entitled, individually or together, to injunctive relief (or a similar remedy) to prevent breaches of this letter agreement and to specifically enforce its provisions in addition to any other remedy available to it at law or in equity.

Row 3 (Edit Op.: FPD)
Original Text: Section 11. The Receiving Party acknowledges that the Confidential Information is a valuable asset of the Disclosing Party. The Receiving Party further acknowledges that the Disclosing Party shall incur irreparable damage if the Receiving Party should breach any of the provisions of this Agreement. Accordingly, if the Receiving Party breaches any of the provisions of this Agreement, the Disclosing party shall be entitled, without prejudice, to all the rights, damages and remedies available to it, including an injunction restraining any breach of the provisions of this Agreement by the Receiving Party or its agents or representatives.

List Edit Type Alignment

In some embodiments where the edit type comprises a list edit type, an alignment method may be selected based on the list edit type.

As used herein, a leaf list may refer to an unstructured or non-enumerated list. One example of a leaf list is a list of nouns separated by a comma. In embodiments where there is a leaf list insert (LLI), the alignment method may comprise identifying a leaf list in the DUA, and tokenizing the leaf list into its constituent list items. The identified leaf list in the DUA is then compared to similar leaf lists in the training data of original texts. If a list item (e.g., in the case in Table F below, “investor”) is being inserted in the original text, and the list item is not already an item in the leaf list in the DUA, then the list item is inserted in the leaf list in the DUA. An example of this alignment method for a LLI edit operation is shown in Table F below.

TABLE F

Row 1
SUA: “Representatives” means directors, officers, employees, leaders, agents, financial advisors, consultants, contractors, attorneys and accountants of a Party or its Affiliate.
Original Text: “Representative” means the directors, officers, employees, investment bankers, rating agencies, consultants, counsel, and other representatives of ADP or the Partner, as applicable.
Final Text: “Representative” means the directors, officers, employees, investment bankers, investors, rating agencies, consultants, counsel, and other representatives of ADP or the Partner, as applicable.
Edit Op.: LLI
ESUA: “Representatives” means directors, officers, employees, leaders, agents, financial advisors, investors, consultants, contractors, attorneys and accountants of a Party or its Affiliate.

Row 2
SUA: “Representatives” means directors, officers, employees, leaders, agents, financial advisors, consultants, contractors, attorneys and accountants of a Party or its Affiliate.
Original Text: “Representatives” means the advisors, agents, consultants, directors, officers, employees and other representatives, including accountants, auditors, financial advisors, lenders and lawyers of a Party.
Final Text: “Representatives” means the advisors, agents, consultants, directors, officers, employees and other representatives, including accountants, auditors, investors, financial advisors, lenders and lawyers of a Party.
Edit Op.: LLI
ESUA: “Representatives” means directors, officers, employees, leaders, agents, financial advisors, investors, consultants, contractors, attorneys and accountants of a Party or its Affiliate.

Row 3
SUA: “Representatives” means directors, officers, employees, leaders, agents, financial advisors, consultants, contractors, attorneys and accountants of a Party or its Affiliate.
Original Text: “Representatives” shall refer to all of each respective Party's partners, officers, directors, shareholders, employees, members, accountants, attorneys, independent contractors, temporary employees, agents or any other representatives or persons that may from time to time be employed, retained by, working for, or acting on behalf of, such Party.
Final Text: “Representatives” shall refer to all of each respective Party's partners, officers, directors, shareholders, employees, members, accountants, investors, attorneys, independent contractors, temporary employees, agents or any other representatives or persons that may from time to time be employed, retained by, working for, or acting on behalf of, such Party.
Edit Op.: LLI
ESUA: “Representatives” means directors, officers, employees, leaders, agents, financial advisors, investors, consultants, contractors, attorneys and accountants of a Party or its Affiliate.

Row 4
SUA: “Representatives” means directors, officers, employees, leaders, agents, financial advisors, consultants, contractors, attorneys and accountants of a Party or its Affiliate.
Original Text: “Representatives,” with respect to a party hereto means the directors, officers, employees, advisors, consultants, bankers (investment and commercial), lawyers, engineers, landmen, geologists, geophysicists and accountants, of such party hereto or any Affiliate of such party hereto.
Final Text: “Representatives,” with respect to a party hereto means the directors, officers, employees, advisors, consultants, bankers (investment and commercial), investors, lawyers, engineers, landmen, geologists, geophysicists and accountants, of such party hereto or any Affiliate of such party hereto.
Edit Op.: LLI
ESUA: “Representatives” means directors, officers, employees, leaders, agents, financial advisors, investors, consultants, contractors, attorneys and accountants of a Party or its Affiliate.
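By way of illustration only, the leaf list insert alignment described above may be sketched in Python as follows. The function names, the regular-expression tokenizer, and the placement rule (insert after the nearest preceding neighbor that the two lists share, otherwise append) are assumptions made for this sketch and are not taken from the disclosure; in particular, the sketch places “investors” after “employees”, whereas Table F shows it after “financial advisors”.

```python
import re


def tokenize_leaf_list(leaf_list: str) -> list[str]:
    """Split a comma-separated leaf list into items; a final "and"/"or" also separates items."""
    parts = re.split(r",\s*(?:(?:and|or)\s+)?|\s+(?:and|or)\s+", leaf_list.strip())
    return [part.strip() for part in parts if part.strip()]


def inserted_items(original_items: list[str], final_items: list[str]) -> list[str]:
    """Items present in the final text but absent from the original text."""
    return [item for item in final_items if item not in original_items]


def apply_leaf_list_insert(sua_items, original_items, final_items):
    """Insert each newly added item into the SUA list unless it is already there."""
    edited = list(sua_items)
    for item in inserted_items(original_items, final_items):
        if item in edited:
            continue  # already an item of the leaf list: nothing to insert
        # Assumed placement rule: after the closest preceding item shared with the SUA.
        position = len(edited)
        for neighbour in reversed(final_items[: final_items.index(item)]):
            if neighbour in edited:
                position = edited.index(neighbour) + 1
                break
        edited.insert(position, item)
    return edited


sua_items = tokenize_leaf_list(
    "directors, officers, employees, leaders, agents, financial advisors, "
    "consultants, contractors, attorneys and accountants"
)
original_items = tokenize_leaf_list(
    "directors, officers, employees, investment bankers, rating agencies, consultants, counsel"
)
final_items = tokenize_leaf_list(
    "directors, officers, employees, investment bankers, investors, rating agencies, "
    "consultants, counsel"
)
esua_items = apply_leaf_list_insert(sua_items, original_items, final_items)
```

Applying the same edit a second time leaves the list unchanged, because the item is then already present in the leaf list.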

As another example, in embodiments where there is a leaf list deletion (LLD), the alignment method may comprise identifying a leaf list in the DUA and tokenizing the leaf list into its constituent list items. The identified leaf list in the DUA is then compared to similar leaf lists in the training data of original texts. If a list item (e.g., in the case in Table G below, “employees”) is being deleted from the original text, and the list item is already an item in the leaf list in the DUA, then the list item is deleted in the leaf list in the DUA.

An example of this alignment method for a LLD edit operation is shown in Table G below.

TABLE G

Row 1
SUA: “Representatives” means directors, officers, employees, leaders, agents, financial advisors, consultants, contractors, attorneys and accountants of a Party or its Affiliate.
Original Text: “Representative” means the directors, officers, employees, investment bankers, rating agencies, consultants, counsel, and other representatives of ADP or the Partner, as applicable.
Final Text: “Representative” means the directors, officers, investment bankers, rating agencies, consultants, counsel, and other representatives of ADP or the Partner, as applicable.
Edit Op.: LLD
ESUA: “Representatives” means directors, officers, leaders, agents, financial advisors, consultants, contractors, attorneys and accountants of a Party or its Affiliate.

Row 2
SUA: “Representatives” means directors, officers, employees, leaders, agents, financial advisors, consultants, contractors, attorneys and accountants of a Party or its Affiliate.
Original Text: “Representatives” means the advisors, agents, consultants, directors, officers, employees and other representatives, including accountants, auditors, financial advisors, lenders and lawyers of a Party.
Final Text: “Representatives” means the advisors, agents, consultants, directors, officers, and other representatives, including accountants, auditors, financial advisors, lenders and lawyers of a Party.
Edit Op.: LLD
ESUA: “Representatives” means directors, officers, leaders, agents, financial advisors, consultants, contractors, attorneys and accountants of a Party or its Affiliate.

Row 3
SUA: “Representatives” means directors, officers, employees, leaders, agents, financial advisors, consultants, contractors, attorneys and accountants of a Party or its Affiliate.
Original Text: “Representatives” shall refer to all of each respective Party's partners, officers, directors, shareholders, employees, members, accountants, attorneys, independent contractors, temporary employees, agents or any other representatives or persons that may from time to time be employed, retained by, working for, or acting on behalf of, such Party.
Final Text: “Representatives” shall refer to all of each respective Party's partners, officers, directors, shareholders, members, accountants, attorneys, independent contractors, temporary employees, agents or any other representatives or persons that may from time to time be employed, retained by, working for, or acting on behalf of, such Party.
Edit Op.: LLD
ESUA: “Representatives” means directors, officers, leaders, agents, financial advisors, consultants, contractors, attorneys and accountants of a Party or its Affiliate.

Row 4
SUA: “Representatives” means directors, officers, employees, leaders, agents, financial advisors, consultants, contractors, attorneys and accountants of a Party or its Affiliate.
Original Text: “Representatives,” with respect to a party hereto means the directors, officers, employees, advisors, consultants, bankers (investment and commercial), lawyers, engineers, landmen, geologists, geophysicists and accountants, of such party hereto or any Affiliate of such party hereto.
Final Text: “Representatives,” with respect to a party hereto means the directors, officers, advisors, consultants, bankers (investment and commercial), lawyers, engineers, landmen, geologists, geophysicists and accountants, of such party hereto or any Affiliate of such party hereto.
Edit Op.: LLD
ESUA: “Representatives” means directors, officers, leaders, agents, financial advisors, consultants, contractors, attorneys and accountants of a Party or its Affiliate.
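By way of illustration only, the leaf list deletion alignment may be sketched in Python as follows. The function names and the case-insensitive exact-match comparison are assumptions made for this sketch; an actual embodiment may match list items semantically rather than literally.

```python
def deleted_items(original_items, final_items):
    """Lower-cased items that appear in the original text but not in the final text."""
    kept = {item.lower() for item in final_items}
    return {item.lower() for item in original_items if item.lower() not in kept}


def apply_leaf_list_delete(sua_items, original_items, final_items):
    """Drop from the SUA list every item that the training edit deleted."""
    to_delete = deleted_items(original_items, final_items)
    return [item for item in sua_items if item.lower() not in to_delete]


def join_leaf_list(items):
    """Rebuild the list text, with "and" before the last item and no serial comma."""
    if len(items) < 2:
        return "".join(items)
    return ", ".join(items[:-1]) + " and " + items[-1]


sua_items = ["directors", "officers", "employees", "leaders", "agents",
             "financial advisors", "consultants", "contractors", "attorneys", "accountants"]
original_items = ["directors", "officers", "employees", "investment bankers",
                  "rating agencies", "consultants", "counsel"]
final_items = ["directors", "officers", "investment bankers",
               "rating agencies", "consultants", "counsel"]

esua_text = join_leaf_list(apply_leaf_list_delete(sua_items, original_items, final_items))
```

With the item lists above, which abbreviate the first row of Table G, only “employees” is removed from the SUA list.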

As used herein, a “structured list” may refer to a structured or enumerated list. For example, a structured list may comprise a set of list items separated by bullet points, numbers ((i), (ii), (iii) . . . ), letters ((a), (b), (c) . . . ), and the like. In some embodiments where the edit type comprises a structured list insert (SLI), an alignment method may be selected based on the SLI edit type. According to the alignment method, each SUA comprising a structured list is compared to semantically similar original texts comprising a structured list. The aligning may further comprise tokenizing the structured lists in the SUA and the original text into their constituent list items. If one of the original texts is labeled with an LII edit operation, then the system determines the best location for insertion of the list item and the list item is inserted in the SUA to arrive at an ESUA. In some embodiments, the best location for insertion may be chosen by putting the inserted item next to the item already in the list with which it is most frequently collocated. In other embodiments, the best location for insertion may be based on weights between nodes in a Markov chain model of the list or other graphical model of the sequence. In some embodiments, if a single SUA triggers multiple LIIs, semantically similar LIIs may be clustered together so that multiple semantically similar LIIs are not applied to the same SUA.

An example of this alignment method for a SLI edit operation is shown in Table H below.

TABLE H

Row 1
SUA: (a) in the public domain at the time of receipt by the Receiving Party through no breach of this Agreement by the Receiving Party; (b) lawfully received by the Receiving Party from a third party; or (c) known by the Receiving Party at the time of receipt.
Original Text: 4.1 prior to its disclosure was properly in Receiving Party's possession; or 4.2 is in the public domain through no fault of the Receiving party; or 4.3 was lawfully known to the Receiving Party prior to disclosure; or 4.4 is lawfully made available to the Receiving Party by a third party entitled to disclose such information.
Final Text: 4.1 prior to its disclosure was properly in Receiving Party's possession; or 4.2 is in the public domain through no fault of the Receiving party; or 4.3 independently developed by or for the Receiving Party; or 4.4 was lawfully known to the Receiving Party prior to disclosure; or 4.5 is lawfully made available to the Receiving Party by a third party entitled to disclose such information.
Edit Op.: SLI
ESUA: (a) in the public domain at the time of receipt by the Receiving Party through no breach of this Agreement by the Receiving Party; (b) independently developed by or for the Receiving Party; (c) lawfully received by the Receiving Party from a third party; or (d) known by the Receiving Party at the time of receipt.

Row 2
SUA: (a) in the public domain at the time of receipt by the Receiving Party through no breach of this Agreement by the Receiving Party; (b) lawfully received by the Receiving Party from a third party; or (c) known by the Receiving Party at the time of receipt.
Original Text: i.) Is publicly known at the time of Discloser's communication to Recipient or thereafter becomes publicly known through no violation of this Agreement; ii.) Was lawfully in Recipient's possession free of any obligation of confidence at the time of Discloser's communication to Recipient; or iii.) Is rightfully obtained by Recipient from a third party authorized to make such disclosure.
Final Text: i.) Is publicly known at the time of Discloser's communication to Recipient or thereafter becomes publicly known through no violation of this Agreement; ii.) Was lawfully in Recipient's possession free of any obligation of confidence at the time of Discloser's communication to Recipient; iii.) Is rightfully obtained by Recipient from a third party authorized to make such disclosure; or iv.) independently developed by or for the Recipient.
Edit Op.: SLI
ESUA: (a) in the public domain at the time of receipt by the Receiving Party through no breach of this Agreement by the Receiving Party; (b) independently developed by or for the Receiving Party; (c) lawfully received by the Receiving Party from a third party; or (d) known by the Receiving Party at the time of receipt.

Row 3
SUA: (a) in the public domain at the time of receipt by the Receiving Party through no breach of this Agreement by the Receiving Party; (b) lawfully received by the Receiving Party from a third party; or (c) known by the Receiving Party at the time of receipt.
Original Text: (a) is or becomes available to the public other than by breach of this Agreement by Recipient; (b) lawfully received from a third party without restriction on disclosure; (c) disclosed by the Discloser to a third party without a similar restriction on the rights of such third party; (d) already known by the Recipient without breach of this Agreement; or (e) approved in writing by the Discloser for public release or disclosure by the Recipient.
Final Text: (a) is or becomes available to the public other than by breach of this Agreement by Recipient; (b) lawfully received from a third party without restriction on disclosure; (c) disclosed by the Discloser to a third party without a similar restriction on the rights of such third party; (d) already known by the Recipient without breach of this Agreement; (e) independently developed by or for the Receiving Party; or (f) approved in writing by the Discloser for public release or disclosure by the Recipient.
Edit Op.: SLI
ESUA: (a) in the public domain at the time of receipt by the Receiving Party through no breach of this Agreement by the Receiving Party; (b) independently developed by or for the Receiving Party; (c) lawfully received by the Receiving Party from a third party; or (d) known by the Receiving Party at the time of receipt.
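By way of illustration only, the collocation-based choice of insertion location for a structured list insert may be sketched in Python as follows. The sketch assumes that each list item has already been mapped to a short canonical label (for example by a semantic matcher that is not shown); the labels, the function names, and the three training lists are hypothetical and are not taken from the disclosure.

```python
import string
from collections import Counter


def collocation_counts(new_item, training_lists):
    """Count how often new_item directly follows or directly precedes each other item."""
    counts = Counter()
    for items in training_lists:
        for i, item in enumerate(items):
            if item != new_item:
                continue
            if i > 0:
                counts[(items[i - 1], "after")] += 1   # new_item sits after this neighbour
            if i + 1 < len(items):
                counts[(items[i + 1], "before")] += 1  # new_item sits before this neighbour
    return counts


def insert_structured_item(sua_items, new_item, training_lists):
    """Insert new_item next to the SUA item it is most frequently collocated with."""
    if new_item in sua_items:
        return list(sua_items)
    edited = list(sua_items)
    for (neighbour, side), _count in collocation_counts(new_item, training_lists).most_common():
        if neighbour in edited:
            position = edited.index(neighbour)
            edited.insert(position + 1 if side == "after" else position, new_item)
            return edited
    edited.append(new_item)  # no known neighbour in the SUA: fall back to the end of the list
    return edited


def enumerate_items(items):
    """Re-label the items (a), (b), (c), ... after the insertion."""
    return [f"({string.ascii_lowercase[i]}) {item}" for i, item in enumerate(items)]


training_lists = [
    ["public domain", "independently developed", "lawfully received"],
    ["public domain", "independently developed", "already known"],
    ["lawfully received", "already known", "independently developed"],
]
sua_items = ["public domain", "lawfully received", "already known"]
esua_items = insert_structured_item(sua_items, "independently developed", training_lists)
esua_labelled = enumerate_items(esua_items)
```

In this toy data the inserted item most often follows “public domain”, so it becomes item (b) and the remaining items are re-lettered. A Markov chain model, as mentioned above, would replace the adjacency counts with transition weights.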

In embodiments where the edit type comprises a structured list deletion (SLD), the alignment method may compare the SUA to semantically similar original texts. If one of the original texts is labeled with an LII edit operation, then the best location for insertion of the list item is determined and the list item is inserted in the SUA to arrive at an ESUA. In some embodiments, if a single SUA triggers multiple LIIs, semantically similar LIIs may be clustered together so that multiple semantically similar LIIs are not applied to the same SUA.

An example of this alignment method for a SLD edit operation is shown in Table I below.

TABLE I

Row 1
SUA: (a) in the public domain at the time of receipt by the Receiving Party through no breach of this Agreement by the Receiving Party; (b) lawfully received by the Receiving Party from a third party; or (c) known by the Receiving Party at the time of receipt.
Original Text: 4.1 prior to its disclosure was properly in Receiving Party's possession; or 4.2 is in the public domain through no fault of the Receiving party; or 4.3 was lawfully known to the Receiving Party prior to disclosure; or 4.4 is lawfully made available to the Receiving Party by a third party entitled to disclose such information.
Final Text: 4.1 prior to its disclosure was properly in Receiving Party's possession; or 4.2 is in the public domain through no fault of the Receiving party; or 4.3 was lawfully known to the Receiving Party prior to disclosure.
Edit Op.: SLD
ESUA: (a) in the public domain at the time of receipt by the Receiving Party through no breach of this Agreement by the Receiving Party; or (b) known by the Receiving Party at the time of receipt.

Row 2
SUA: (a) in the public domain at the time of receipt by the Receiving Party through no breach of this Agreement by the Receiving Party; (b) lawfully received by the Receiving Party from a third party; or (c) known by the Receiving Party at the time of receipt.
Original Text: i.) Is publicly known at the time of Discloser's communication to Recipient or thereafter becomes publicly known through no violation of this Agreement; ii.) Was lawfully in Recipient's possession free of any obligation of confidence at the time of Discloser's communication to Recipient; or iii.) Is rightfully obtained by Recipient from a third party authorized to make such disclosure.
Final Text: i.) Is publicly known at the time of Discloser's communication to Recipient or thereafter becomes publicly known through no violation of this Agreement; or ii.) Was lawfully in Recipient's possession free of any obligation of confidence at the time of Discloser's communication to Recipient.
Edit Op.: SLD
ESUA: (a) in the public domain at the time of receipt by the Receiving Party through no breach of this Agreement by the Receiving Party; or (b) known by the Receiving Party at the time of receipt.

Row 3
SUA: (a) in the public domain at the time of receipt by the Receiving Party through no breach of this Agreement by the Receiving Party; (b) lawfully received by the Receiving Party from a third party; or (c) known by the Receiving Party at the time of receipt.
Original Text: (a) is or becomes available to the public other than by breach of this Agreement by Recipient; (b) lawfully received from a third party without restriction on disclosure; (c) disclosed by the Discloser to a third party without a similar restriction on the rights of such third party; (d) already known by the Recipient without breach of this Agreement; or (e) approved in writing by the Discloser for public release or disclosure by the Recipient.
Final Text: (a) is or becomes available to the public other than by breach of this Agreement by Recipient; (b) disclosed by the Discloser to a third party without a similar restriction on the rights of such third party; (c) already known by the Recipient without breach of this Agreement; or (d) approved in writing by the Discloser for public release or disclosure by the Recipient.
Edit Op.: SLD
ESUA: (a) in the public domain at the time of receipt by the Receiving Party through no breach of this Agreement by the Receiving Party; or (b) known by the Receiving Party at the time of receipt.
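By way of illustration only, the deletion shown in Table I (removing one enumerated item and re-lettering the remainder) may be sketched in Python as follows. The function names are hypothetical, and the keyword predicate standing in for semantic matching between the deleted training item and the SUA item is an assumption made for this sketch.

```python
import re
import string


def split_structured_list(text):
    """Split "(a) ...; (b) ...; or (c) ...." into its bare list items."""
    body = text.strip().rstrip(".")
    parts = re.split(r";?\s*(?:(?:or|and)\s+)?\([a-z]\)\s*", body)
    return [part.strip() for part in parts if part.strip()]


def join_structured_list(items, conjunction="or"):
    """Re-letter the items and restore the conjunction before the last one."""
    labelled = [f"({string.ascii_lowercase[i]}) {item}" for i, item in enumerate(items)]
    if len(labelled) > 1:
        labelled[-1] = f"{conjunction} {labelled[-1]}"
    return "; ".join(labelled) + "."


def delete_structured_item(text, should_delete):
    """Remove every item for which should_delete(item) is true, then rebuild the list."""
    kept = [item for item in split_structured_list(text) if not should_delete(item)]
    return join_structured_list(kept)


sua = (
    "(a) in the public domain at the time of receipt by the Receiving Party through no "
    "breach of this Agreement by the Receiving Party; (b) lawfully received by the "
    "Receiving Party from a third party; or (c) known by the Receiving Party at the "
    "time of receipt."
)
# Assumed stand-in for semantic matching against the item deleted in the training data.
esua = delete_structured_item(sua, lambda item: "from a third party" in item)
```

With the SUA of Table I as input, the sketch yields the two-item ESUA shown in the table.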

FIG. 8 is a block diagram illustrating an edit suggestion device according to some embodiments. In some embodiments, device 800 is application server 101. As shown in FIG. 8, device 800 may comprise: a data processing system (DPS) 802, which may include one or more processors 855 (e.g., a general purpose microprocessor and/or one or more other data processing circuits, such as an application specific integrated circuit (ASIC), field-programmable gate arrays (FPGAs), and the like); a network interface 803 for use in connecting device 800 to network 120; and local storage unit (a.k.a., “data storage system”) 806, which may include one or more non-volatile storage devices and/or one or more volatile storage devices (e.g., random access memory (RAM)). In embodiments where device 800 includes a general purpose microprocessor, a computer program product (CPP) 833 may be provided. CPP 833 includes a computer readable medium (CRM) 842 storing a computer program (CP) 843 comprising computer readable instructions (CRI) 844. CRM 842 may be a non-transitory computer readable medium, such as, but not limited to, magnetic media (e.g., a hard disk), optical media (e.g., a DVD), memory devices (e.g., random access memory), and the like. In some embodiments, the CRI 844 of computer program 843 is configured such that when executed by data processing system 802, the CRI causes the device 800 to perform steps described herein (e.g., steps described above and with reference to the flow charts). In other embodiments, device 800 may be configured to perform steps described herein without the need for code. That is, for example, data processing system 802 may consist merely of one or more ASICs. Hence, the features of the embodiments described herein may be implemented in hardware and/or software.

FIG. 9 is a method for suggesting revisions to text data, according to some embodiments. In some embodiments, the method 900 may be performed by edit suggestion device 800 or system 100.

Step 901 comprises obtaining a text under analysis (TUA). In some embodiments, the TUA may be a document-under-analysis (DUA) or a subset of the DUA, such as a statement-under-analysis (SUA).

Step 903 comprises obtaining a candidate original text from a plurality of original texts. In some embodiments, step 903 may comprise obtaining a first original text from the seed database for comparison against a SUA as described above in connection with FIG. 3, step 310. As described above, different comparisons, or similarity metrics, may be determined based on an identified edit type in the first original text.

Step 905 comprises identifying a first edit operation of the candidate original text with respect to a candidate final text associated with the candidate original text, the first edit operation having an edit-type classification. As discussed above, an edit operation may comprise, for example, a deletion, insertion, or replacement of text data in the candidate original text as compared to its associated candidate final text. The edit-type classification may comprise, for example, a point edit, span edit, list edit, full edit (e.g., FSI/FSD/FPI/FPD), or a chunk edit.

Step 907 comprises selecting an alignment method from a plurality of alignment methods based on the edit-type classification of the first edit operation. For example, as described above, different alignment methods may be employed based on whether the edit type is a point, span, full, or list edit.

Step 909 comprises identifying a second edit operation based on the selected alignment method. In some embodiments, the second edit operation may be the same as the first edit operation of the candidate original text (e.g., insertion or deletion of the same or semantically similar text).

Step 911 comprises creating an edited TUA (ETUA) by applying to the TUA the second edit operation.
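By way of illustration only, steps 905 through 911 may be sketched in Python as a dispatch on the edit-type classification. Everything named below is hypothetical: the word-level diff built on the standard-library `difflib`, the crude rule that labels a one-word change a point edit and a longer change a span edit, and the two placeholder alignment methods are assumptions made for this sketch and do not reproduce the alignment methods of the disclosure. Pure insertions are left unhandled for brevity.

```python
import difflib
from dataclasses import dataclass


@dataclass
class EditOperation:
    kind: str       # "insert", "delete" or "replace"
    edit_type: str  # edit-type classification, here "point", "span" or "full"
    old: str        # words removed from the original text
    new: str        # words added in the final text


def identify_edit(original: str, final: str) -> EditOperation:
    """Step 905 (sketch): first differing block of words between original and final text."""
    a, b = original.split(), final.split()
    if not b:
        return EditOperation("delete", "full", original, "")
    for tag, i1, i2, j1, j2 in difflib.SequenceMatcher(a=a, b=b).get_opcodes():
        if tag != "equal":
            size = max(i2 - i1, j2 - j1)
            return EditOperation(tag, "point" if size == 1 else "span",
                                 " ".join(a[i1:i2]), " ".join(b[j1:j2]))
    raise ValueError("original and final text are identical")


def align_same_words(tua, op):
    """Placeholder alignment: reuse the edit only if its removed words occur in the TUA."""
    return op if op.old and op.old in tua else None


def align_full(tua, op):
    """Placeholder alignment for a full edit: delete the whole TUA."""
    return EditOperation("delete", "full", tua, "")


ALIGNMENT_METHODS = {"point": align_same_words, "span": align_same_words, "full": align_full}


def suggest_revision(tua: str, original: str, final: str) -> str:
    first_op = identify_edit(original, final)        # step 905
    align = ALIGNMENT_METHODS[first_op.edit_type]    # step 907
    second_op = align(tua, first_op)                 # step 909
    if second_op is None:
        return tua                                   # no applicable edit: TUA unchanged
    edited = tua.replace(second_op.old, second_op.new, 1)
    return " ".join(edited.split())                  # step 911, with whitespace normalised


etua = suggest_revision(
    "payment is due within thirty days",
    "within thirty days of notice",
    "within sixty days of notice",
)
```

Keeping the alignment methods in a lookup table keyed by edit-type classification mirrors step 907: supporting a further edit type only requires registering another function.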

While various embodiments of the present disclosure are described herein, it should be understood that they have been presented by way of example only, and not limitation. Thus, the breadth and scope of the present disclosure should not be limited by any of the above-described exemplary embodiments. Moreover, any combination of the above-described elements in all possible variations thereof is encompassed by the disclosure unless otherwise indicated herein or otherwise clearly contradicted by context. It will be apparent to those skilled in the art that various modifications and variations can be made in the method and system for suggesting revisions to an electronic document without departing from the spirit or scope of the invention. Thus, it is intended that embodiments of the invention cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

Additionally, while the processes described above and illustrated in the drawings are shown as a sequence of steps, this was done solely for the sake of illustration. Accordingly, it is contemplated that some steps may be added, some steps may be omitted, the order of the steps may be re-arranged, and some steps may be performed in parallel.

1. A method for suggesting revisions to text data, the method comprising: obtaining a text-under-analysis (“TUA”); obtaining an original text from a plurality of original texts; identifying an edit operation of the original text with respect to a final text associated with the original text, the edit operation having an edit-type classification; selecting a similarity scoring metric from a plurality of similarity scoring metrics based on the edit-type classification; generating a similarity score for the original text using the selected similarity scoring metric, the similarity score representing a degree of similarity between the TUA and the original text; and creating an edited TUA (“ETUA”) by modifying the TUA consistent with the final text associated with the original text.